Research in Text Processing: Creating Robust and Portable Systems

Author

  • Ralph Grishman
Abstract

Objective: Our goal is to improve the technology for retrieving passages, extracting specific facts, and creating formatted data bases from large text collections. In particular, we are concerned with developing techniques for automatically training language processing systems to the syntax and semantics of particular domains and types of text in order to improve system performance.

Approach: Improving the natural language technology for information extraction and retrieval will require

  • increased knowledge of specialized language usage and domain semantics
  • tools for acquiring this knowledge
  • analysis mechanisms which can cope with gaps in this knowledge

In natural language text, much of the information is implicit and much of it, viewed in isolation, is ambiguous. Increased information about syntactic usage, discourse patterns, and the semantics of particular domains is essential to resolve this ambiguity and extract the intended facts from the text. However, collecting this information manually for each type of text is difficult and time-consuming, and renders the system non-portable. It is therefore desirable to be able to extract such characteristics as the relative preference for different syntactic structures and the semantic classes and constraints automatically from a sample of text in a particular domain.

Since the text samples are finite, this information will always be incomplete. In addition, any real text will contain typographical and syntactic errors and semantic relations outside the principal domain. In consequence, a high-performance system will require a forgiving analysis procedure which tries to minimize constraint violations but does not insist on a "perfect" input.

To guide and evaluate our work on the underlying technologies, we have developed three message processing applications over the past five years. The first was for CASREPs (equipment failure messages). The focus for this system was on deep domain models for language understanding, and in particular for the determination of the implicit causal and temporal relations between events in a narrative. The other systems involved RAINFORMs and OPREPs (messages describing naval encounters and engagements). These systems were developed for the Message Understanding Conferences organized by the Naval Ocean Systems Center. The focus for these systems was on robustness: the ability to extract at least partial information despite violations of syntactic or semantic constraints.

Since the last Message Understanding Conference (in June 1989) we have analyzed some of the factors contributing to the performance of our system. We reported at the last DARPA Workshop on the crucial role played by preference semantics …
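The idea of a forgiving analysis procedure, one that minimizes constraint violations instead of insisting on a "perfect" input, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Analysis` class, the `PENALTIES` weights, and the function names are all hypothetical and chosen only for illustration. The sketch contrasts a strict selector, which fails when every candidate reading violates some constraint, with a forgiving one that returns the least-penalized reading.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One candidate reading of a sentence (hypothetical structure)."""
    label: str
    # Names of the syntactic or semantic constraints this reading violates.
    violations: list = field(default_factory=list)


# Hypothetical penalty weights; constraints not listed here cost 1.0.
PENALTIES = {"syntax": 2.0, "semantic_class": 1.0}


def penalty(analysis):
    """Total cost of the constraints violated by one analysis."""
    return sum(PENALTIES.get(v, 1.0) for v in analysis.violations)


def strict_choice(candidates):
    """Accept only a violation-free analysis; return None if there is none."""
    perfect = [c for c in candidates if not c.violations]
    return perfect[0] if perfect else None


def forgiving_choice(candidates):
    """Return the least-penalized analysis, or None if there are no candidates."""
    return min(candidates, key=penalty, default=None)


# Two imperfect readings of the same (imagined) message sentence.
candidates = [
    Analysis("reading A", ["syntax", "semantic_class"]),
    Analysis("reading B", ["semantic_class"]),
]
```

With these two candidates, `strict_choice` yields nothing, while `forgiving_choice` still returns "reading B", so at least partial information can be extracted from the input.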


Similar resources

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

UMASS/HUGHES: Description Of The Circus System Used For Tipster Text

The primary goal of our effort is the development of robust and portable language processing capabilities for information extraction applications. The system under evaluation here is based on language processing components that have demonstrated strong performance capabilities in previous evaluations [Lehnert et al. 1992a]. Having demonstrated the general viability of these techniques, we are n...

UMass/Hughes TIPSTER Project on Extraction from Text

The primary goal of our effort is the development of robust and portable language processing capabilities and information extraction applications. Our system is based on a sentence analysis technique called selective concept extraction. Having demonstrated the general viability of this technique in previous evaluations [Lehnert, et al. 1992], we are now concentrating on the practicality of our ...

Vibrotactile Identification of Signal-Processed Sounds from Environmental Events Presented by a Portable Vibrator: A Laboratory Study

Objectives: To evaluate different signal-processing algorithms for tactile identification of environmental sounds in a monitoring aid for the deafblind. Two men and three women, sensorineurally deaf or profoundly hearing impaired with experience of vibratory experiments, age 22-36 years. Methods: A closed set of 45 representative environmental sounds were processed using two transposing (TRH...

Resource Report: Building Parallel Text Corpora for Multi-Domain Translation System

Parallel text is one of the most valuable resources for the development of statistical machine translation systems and other NLP applications. However, manual translations are very costly, and the amount of known parallel text is limited. Hence, our research started with creating and collecting a large amount of parallel text resources for Indonesian-English. We describe in this paper the creation ...

Publication date: 1990